95 research outputs found

    Improved shrunken centroid classifiers for high-dimensional class-imbalanced data

    BACKGROUND: PAM, a nearest shrunken centroid (NSC) method, is a popular classification method for high-dimensional data. ALP and AHP are NSC algorithms that were proposed to improve upon PAM. The NSC methods base their classification rules on shrunken centroids; in practice, the amount of shrinkage is estimated by minimizing the overall cross-validated (CV) error rate. RESULTS: We show that when data are class-imbalanced, the three NSC classifiers are biased towards the majority class. The bias is larger when the number of variables or the class imbalance is larger and/or the differences between classes are smaller. To diminish the class-imbalance problem of the NSC classifiers, we propose to estimate the amount of shrinkage by maximizing the CV geometric mean of the class-specific predictive accuracies (g-means). CONCLUSIONS: The results obtained on simulated and real high-dimensional class-imbalanced data show that our approach outperforms the currently used strategy based on the minimization of the overall error rate when NSC classifiers are biased towards the majority class. The number of variables included in the NSC classifiers when using our approach is much smaller than with the original approach. This result is supported by experiments on simulated and real high-dimensional class-imbalanced data.
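    The contrast between the two tuning criteria described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the two candidate shrinkage values and the cross-validated predictions below are all hypothetical, chosen only to show how minimizing the overall error rate and maximizing g-means can pick different amounts of shrinkage on imbalanced data.

```python
def class_accuracies(y_true, y_pred):
    """Fraction of correctly predicted samples, computed separately per class."""
    totals, correct = {}, {}
    for t, p in zip(y_true, y_pred):
        totals[t] = totals.get(t, 0) + 1
        correct[t] = correct.get(t, 0) + (1 if t == p else 0)
    return {c: correct[c] / totals[c] for c in totals}


def g_means(y_true, y_pred):
    """Geometric mean of the class-specific predictive accuracies."""
    accs = list(class_accuracies(y_true, y_pred).values())
    product = 1.0
    for a in accs:
        product *= a
    return product ** (1.0 / len(accs))


def overall_error(y_true, y_pred):
    """Overall misclassification rate, ignoring class membership."""
    wrong = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return wrong / len(y_true)


# Hypothetical data: 90 majority-class samples ("maj") and 10 minority ("min").
y_true = ["maj"] * 90 + ["min"] * 10

# Hypothetical CV predictions for two candidate shrinkage values.
cv_predictions = {
    # Heavy shrinkage: every sample is assigned to the majority class.
    2.0: ["maj"] * 100,
    # Lighter shrinkage: 80/90 majority and 8/10 minority samples correct.
    0.5: ["maj"] * 80 + ["min"] * 10 + ["min"] * 8 + ["maj"] * 2,
}

# Criterion 1: minimize the overall CV error rate.
best_by_error = min(cv_predictions,
                    key=lambda s: overall_error(y_true, cv_predictions[s]))

# Criterion 2: maximize the CV g-means.
best_by_gmeans = max(cv_predictions,
                     key=lambda s: g_means(y_true, cv_predictions[s]))

print(best_by_error, best_by_gmeans)
```

    In this made-up example the error-rate criterion prefers the shrinkage value whose classifier never predicts the minority class, because its g-means is zero while its overall error is only 10%.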

    Class prediction for high-dimensional class-imbalanced data

    BACKGROUND: The goal of class prediction studies is to develop rules to accurately predict the class membership of new samples. The rules are derived using the values of the variables available for each subject: the main characteristic of high-dimensional data is that the number of variables greatly exceeds the number of samples. Frequently the classifiers are developed using class-imbalanced data, i.e., data sets where the number of samples in each class is not equal. Standard classification methods used on class-imbalanced data often produce classifiers that do not accurately predict the minority class; the prediction is biased towards the majority class. In this paper we investigate whether high dimensionality poses additional challenges when dealing with class-imbalanced prediction. We evaluate the performance of six types of classifiers on class-imbalanced data, using simulated data and a publicly available data set from a breast cancer gene-expression microarray study. We also investigate the effectiveness of some strategies that are available to overcome the effect of class imbalance. RESULTS: Our results show that the evaluated classifiers are highly sensitive to class imbalance and that variable selection introduces an additional bias towards classification into the majority class. Most new samples are assigned to the majority class from the training set, unless the difference between the classes is very large. As a consequence, the class-specific predictive accuracies differ considerably. When the class imbalance is not too severe, down-sizing and asymmetric bagging embedding variable selection work well, while over-sampling does not. Variable normalization can further worsen the performance of the classifiers. CONCLUSIONS: Our results show that matching the prevalence of the classes in training and test set does not guarantee good performance of classifiers and that the problems related to classification with class-imbalanced data are exacerbated when dealing with high-dimensional data. Researchers using class-imbalanced data should be careful in assessing the predictive accuracy of the classifiers and, unless the class imbalance is mild, they should always use an appropriate method for dealing with the class imbalance problem.

    News

    Detailed description of the classifiers. In the Additional file we provide a description of each classifier used in the paper. (PDF 201 kb)

    Parameterized Link Functions in Generalized Linear Random Effect Models: a Case Study on Breast Cancer Treatment

    In non-linear random effects models, some attention has very recently been devoted to the analysis of suitable transformations of the response variables, considered separately (Taylor 1996) or not (Oberg and Davidian 2000) from the transformations of the covariates. As far as we know, however, no investigation has been carried out on the choice of link function in such models. In our study we consider the use of a random effect model when a parameterized family of links (Aranda-Ordaz 1981, Prentice 1996, Pregibon 1980, Stukel 1988 and Czado 1997) is introduced. We point out the advantages and the drawbacks associated with this data-driven kind of modeling. Difficulties in the interpretation of regression parameters, and therefore in understanding the influence of covariates, as well as problems related to loss of efficiency of estimates and overfitting, are discussed. A case study on radiotherapy usage in breast cancer treatment is presented.

    ICT as a resource for the inclusive school (As TIC como recurso para escola inclusiva)

    Special Education in Brazil has representative laws on inclusion, but these remain open to much discussion and interpretation, both in how they are read and in how effectively compliance is guaranteed; they seek to promote the access of students to schools and their permanence there. Investment, qualification, awareness and adequate remuneration guide the use of technological tools, which are indisputably necessary for contemporary education. Given the above, the general objective of this investigation is to ensure the implementation of ICT (Information and Communication Technology) and to examine its influence on the teaching and learning process of students with SEN (Special Educational Needs), specifically at the Escola Municipal Gumercindo Vicente Santana, located in the city of Palminópolis, in the state of Goiás, Brazil, and, consequently, to serve as a reference for other schools in the interior. It should be noted that the resistance shown by teachers towards the innovative methodology may be related to the antagonism between traditional and modern. Quantitative research was used as the investigation methodology, and a questionnaire survey was used as the instrument of data collection. The study debates how students with SEN should be received in regular schools and the role of the educator in the face of inclusion. Currently, ICT make a difference in the development of students, with the necessary curricular adaptations, preparing them for the critical-constructive world, working on their particularities, enhancing their skills and promoting autonomy in the construction of knowledge, thus favoring a more selective observation. This study seeks resources for the inclusive school in the district of Palminópolis - GO, implementing ICT as a pedagogical methodology.

    Impact of Pre-Existing Treatment with Statins on the Course and Outcome of Tick-Borne Encephalitis

    OBJECTIVES: Although statins have anti-inflammatory and potentially also antimicrobial (including antiviral) activity, their therapeutic impact on infectious diseases is controversial. In this study, we evaluated whether pre-existing statin use influenced the course and outcome of tick-borne encephalitis. METHODS: To assess the influence of statin usage on the severity of acute illness and the outcome of tick-borne encephalitis, univariate and multivariable analyses were performed for 700 adult patients with tick-borne encephalitis of whom 77 (11%) were being treated with statins, and for 410 patients of whom 53 (13%) were receiving statins, respectively. RESULTS: Multivariable analyses found no statistically significant association between statin usage and having a milder acute illness. There was also no statistically significant benefit with respect to a favorable outcome defined by the absence of post-encephalitic syndrome (the OR for a favorable outcome at 6 months was 0.96, 95% CI: 0.46-2.04, P = 0.926; at 12 months 0.29, 95% CI: 0.06-1.33, P = 0.111; at 2-7 years after acute illness 0.44, 95% CI: 0.09-2.22, P = 0.321), by a reduction in the frequency of six nonspecific symptoms (fatigue, myalgia/arthralgia, memory disturbances, headache, concentration disturbances, irritability) occurring during the 4-week period before the last examination, or by higher SF-36 scores in any of the eight separate domains of health as well as in the physical and mental global overall component. Furthermore, there were no significant differences between patients receiving statins and those who were not in the cerebrospinal fluid or serum levels of any of the 24 cytokines/chemokines measured. CONCLUSIONS: In this observational study, we could not prove that pre-existing use of statins affected either the severity of the acute illness or the long-term outcome of tick-borne encephalitis.

    Statistical analysis of high-dimensional biomedical data: a gentle introduction to analytical goals, common approaches and challenges

    Background: In high-dimensional data (HDD) settings, the number of variables associated with each observation is very large. Prominent examples of HDD in biomedical research include omics data with a large number of variables such as many measurements across the genome, proteome, or metabolome, as well as electronic health records data that have large numbers of variables recorded for each patient. The statistical analysis of such data requires knowledge and experience, sometimes of complex methods adapted to the respective research questions. Methods: Advances in statistical methodology and machine learning methods offer new opportunities for innovative analyses of HDD, but at the same time require a deeper understanding of some fundamental statistical concepts. Topic group TG9 “High-dimensional data” of the STRATOS (STRengthening Analytical Thinking for Observational Studies) initiative provides guidance for the analysis of observational studies, addressing particular statistical challenges and opportunities for the analysis of studies involving HDD. In this overview, we discuss key aspects of HDD analysis to provide a gentle introduction for non-statisticians and for classically trained statisticians with little experience specific to HDD. Results: The paper is organized with respect to subtopics that are most relevant for the analysis of HDD, in particular initial data analysis, exploratory data analysis, multiple testing, and prediction. For each subtopic, main analytical goals in HDD settings are outlined. For each of these goals, basic explanations for some commonly used analysis methods are provided. Situations are identified where traditional statistical methods cannot, or should not, be used in the HDD setting, or where adequate analytic tools are still lacking. Many key references are provided. Conclusions: This review aims to provide a solid statistical foundation for researchers, including statisticians and non-statisticians, who are new to research with HDD or simply want to better evaluate and understand the results of HDD analyses.

    Restricted cubic splines for modelling periodic data.

    In regression modelling, non-linear relationships between explanatory variables and the outcome are often effectively modelled using restricted cubic splines (RCS). We focus on situations where the values of the outcome change periodically over time, and we define an extension of RCS that accounts for periodicity by introducing numerical constraints. Practical examples include the estimation of seasonal variations, a common aim in virological research, or the study of hormonal fluctuations within the menstrual cycle. Using real and simulated data with binary outcomes, we show that periodic RCS can perform better than other methods proposed for periodic data. They greatly reduce the variability of the estimates obtained at the extremes of the period compared to cubic spline methods and require the estimation of fewer parameters; cosinor models perform similarly to the best cubic spline model and their estimates are generally less variable, but only if an appropriate number of harmonics is used. Periodic RCS provide a useful extension of RCS for periodic data when the assumption of equality of the outcome at the beginning and end of the period is scientifically sensible. The implementation of periodic RCS is freely available in the peRiodiCS R package, and the paper presents examples of its usage for modelling the seasonal occurrence of viruses.
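    As a rough illustration of the cosinor comparator mentioned in this abstract (not of periodic RCS, and not of the peRiodiCS package), the sketch below builds the sine and cosine regressors for a chosen number of harmonics. The function name `cosinor_basis` and the 12-month period are assumptions made for the example; the point is only that such a basis takes the same values at the beginning and end of the period, which is the equality assumption the abstract refers to.

```python
import math


def cosinor_basis(t, period, n_harmonics):
    """Return [cos(w*t), sin(w*t), cos(2*w*t), sin(2*w*t), ...] with w = 2*pi/period.

    Hypothetical helper: each harmonic k contributes one cosine and one sine
    column, so the row has 2 * n_harmonics entries.
    """
    row = []
    for k in range(1, n_harmonics + 1):
        angle = 2.0 * math.pi * k * t / period
        row.extend([math.cos(angle), math.sin(angle)])
    return row


# Assumed setting: time measured in months, with a 12-month seasonal period.
PERIOD = 12.0
start_of_period = cosinor_basis(0.0, PERIOD, n_harmonics=2)
end_of_period = cosinor_basis(PERIOD, PERIOD, n_harmonics=2)

# The regressors coincide (up to rounding) at the two ends of the period,
# so any linear predictor built from them is periodic by construction.
same_at_ends = all(
    math.isclose(a, b, abs_tol=1e-9)
    for a, b in zip(start_of_period, end_of_period)
)
print(len(start_of_period), same_at_ends)
```

    The rows produced this way would be used as covariates in an ordinary regression model (for example a logistic model for a binary outcome); the number of harmonics controls how flexible the fitted seasonal curve can be.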